Evolving model trees for mining data sets with continuous-valued classes
Authors
Abstract
This paper presents a genetic programming (GP) approach, called GPMCC, to extract symbolic rules from data sets with continuous-valued classes. The GPMCC makes use of a genetic algorithm (GA) to evolve multi-variate non-linear models [Potgieter, G., & Engelbrecht, A. (2007). Genetic algorithms for the structural optimisation of learned polynomial expressions. Applied Mathematics and Computation] at the terminal nodes of the GP. Several mechanisms have been developed to optimise the GP, including a fragment pool of candidate non-linear models, k-means clustering of the training data to facilitate the use of stratified sampling methods, and specialized mutation and crossover operators to evolve structurally optimal and accurate models. It is shown that the GPMCC is insensitive to control parameter values. Experimental results show that the accuracy of the GPMCC is comparable to that of NeuroLinear and Cubist, while producing significantly fewer rules with less complex antecedents. © 2007 Elsevier Ltd. All rights reserved.
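The abstract mentions k-means clustering of the training data as a basis for stratified sampling. The block below is a minimal sketch of that general idea and is not the GPMCC implementation: it runs a plain Lloyd's k-means and then draws the same fraction of points from every cluster. The function names (`kmeans`, `stratified_sample`), the toy data, and all parameter values are assumptions made for illustration.

```python
import random
from collections import defaultdict


def kmeans(points, k, iters=20, seed=0):
    """Plain Lloyd's k-means on a list of equal-length tuples.

    Returns one cluster label per point. Illustrative only.
    """
    rng = random.Random(seed)
    centroids = rng.sample(points, k)
    labels = [0] * len(points)
    for _ in range(iters):
        # Assignment step: nearest centroid by squared Euclidean distance.
        labels = [
            min(
                range(k),
                key=lambda c: sum((a - b) ** 2 for a, b in zip(p, centroids[c])),
            )
            for p in points
        ]
        # Update step: move each centroid to the mean of its members.
        for c in range(k):
            members = [p for p, lab in zip(points, labels) if lab == c]
            if members:  # keep the old centroid if a cluster is empty
                centroids[c] = tuple(sum(col) / len(members) for col in zip(*members))
    return labels


def stratified_sample(labels, fraction, seed=0):
    """Draw about `fraction` of the indices from every cluster (at least one each)."""
    rng = random.Random(seed)
    by_cluster = defaultdict(list)
    for idx, lab in enumerate(labels):
        by_cluster[lab].append(idx)
    sample = []
    for lab in sorted(by_cluster):
        members = by_cluster[lab]
        n = max(1, round(fraction * len(members)))
        sample.extend(rng.sample(members, n))
    return sorted(sample)


# Toy data (assumed): two well-separated groups of ten 2-D points each.
points = [(0.1 * i, 0.0) for i in range(10)] + [(10 + 0.1 * i, 5.0) for i in range(10)]
labels = kmeans(points, k=2)
subset = stratified_sample(labels, fraction=0.3)
```

Sampling per cluster rather than uniformly keeps every region of the input space represented in the subset, which is the usual motivation for stratification when only part of the training data is used at a time.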
Similar resources
A New Algorithm for Optimization of Fuzzy Decision Tree in Data Mining
Decision-tree algorithms provide one of the most popular methodologies for symbolic knowledge acquisition. The resulting knowledge, a symbolic decision tree along with a simple inference mechanism, has been praised for comprehensibility. The most comprehensible decision trees have been designed for perfect symbolic data. Classical crisp decision trees (DT) are widely applied to classification t...
Universal Approximation of Interval-valued Fuzzy Systems Based on Interval-valued Implications
It is first proved that the multi-input-single-output (MISO) fuzzy systems based on interval-valued $R$- and $S$-implications can approximate any continuous function defined on a compact set to arbitrary accuracy. A formula to compute the lower upper bounds on the number of interval-valued fuzzy sets needed to achieve a pre-specified approximation accuracy for an arbitrary multivariate con...
Mining hyperintervals Getting to grips with real-valued data
Many uses of data mining, such as clustering, classification, the construction of decision trees, subgroup discovery and itemset mining, often cope poorly with real-valued data. In fact, it is common for data mining methods to work well only on nominal data with few distinct values. We build the theory to fill this gap for data from arbitrary uncountable sets and introduce ...
A Survey of Information Theory Application on Data Mining
In the data mining area, "classification" is one of the most important issues. Generating decision trees is a very useful and reliable approach. There are several ways to construct a decision tree. Among them, Information Theory is a very effective and scalable method. This is a survey project in Information Theory. We focus on the generation of decision tree for classific...
Data Mining Techniques in Processing Medical Knowledge
Data mining is an evolving and growing area of research and development, both in academia and in industry. It involves interdisciplinary research and development encompassing diverse domains. In this age of multimedia data exploration, data mining should no longer be restricted to the mining of knowledge from large volumes of high-dimensional data sets in traditional databases only. The ...
Journal title:
- Expert Syst. Appl.
Volume 35, Issue
Pages -
Publication date 2008